I've Created a Custom GPT That Extracts Data from Websites

  • Published: Aug 28, 2024

Comments • 31

  • @ThePyCoach 9 months ago +3

    To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/ThePyCoach/. The first 200 of you will get 20% off Brilliant’s annual premium subscription.

  • @arpanoverload 9 months ago +4

    For non-linear content, you can open the developer tools in any browser and copy/paste the HTML code into a text file. Parse the text file from the command line (e.g., grep http html-copy.txt) and then pipe the output to awk to structure your next action (e.g., grep http html-copy.txt | awk '{ print "wget " $0, "[-options]" }'). This prepends "wget" to every matching line and appends [-options]. When ready to execute, simply pipe the entire output into sh. Further optimizations are indeed possible with Python, etc., but the CLI workflow I'm highlighting here is foundational to becoming a programmer.
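The grep/awk pipeline in the comment above can also be sketched in Python. This is only an illustration, not the commenter's method: the names `LinkCollector` and `build_wget_commands` are hypothetical, and `--no-clobber` merely stands in for the `[-options]` placeholder. Unlike `grep http`, which returns whole matching lines, this version keeps only the URL values found in `href` and `src` attributes.

```python
from html.parser import HTMLParser
import shlex


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from href and src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            is_link_attr = name in ("href", "src")
            if is_link_attr and value and value.startswith(("http://", "https://")):
                if value not in self.links:  # keep the first occurrence only
                    self.links.append(value)


def build_wget_commands(html_text, options=""):
    """Return one wget command string per link found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    prefix = f"wget {options}".strip()
    # shlex.quote protects URLs that contain shell-special characters.
    return [f"{prefix} {shlex.quote(url)}" for url in collector.links]


# Inline sample standing in for the contents of html-copy.txt.
sample_html = (
    '<a href="https://example.com/a.pdf">A</a>'
    '<img src="/local.png">'
    '<a href="https://example.com/b?x=1">B</a>'
)
for command in build_wget_commands(sample_html, options="--no-clobber"):
    print(command)
```

The commands are printed rather than executed, which mirrors the comment's advice to inspect the output before piping it into sh.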

  • @grantwylie4302 7 months ago +1

    I'm a new subscriber, but I have been very curious about your subject for a while. I can't find a teacher or instructor who can convey the information well enough for me to understand at my level. I hope you can, and I am excited about your scraper GPT. Let's begin, shall we!!!

  • @albertwang5974 6 months ago +1

    Nice Tricks! Thanks for sharing!

  • @Nick-Quick 9 months ago +3

    00:01 Created a GPT to extract data from websites
    01:27 Save web pages as PDF and extract data using custom GPT
    02:53 Extracting data from websites using a custom GPT
    04:21 Exporting data to a CSV file successfully
    05:36 Creating a custom GPT to extract data from websites
    06:58 Extracting and exporting data from websites using a custom GPT
    08:27 Issues with vertical lists and data extraction
    09:56 Learn an easy approach to extract data from websites using custom GPT

  • @venkat.sairam 9 months ago +3

    🎯 Key Takeaways for quick navigation:
    00:00 🤖 The video introduces a method for extracting data from websites using GPT without actually visiting the websites.
    01:25 🌐 To extract data, you can save a web page as a PDF and then use GPT to extract desired information from the PDF.
    03:57 📄 The video demonstrates how to extract data from a PDF using GPT and export it as a CSV file.
    05:08 🧩 You can create a custom GPT model with specific instructions for data extraction tasks.
    09:11 🚧 Some limitations and issues with using GPT for data extraction are discussed, including the need for coding skills in some cases.
    Made with HARPA AI

  • @yourfitnature 9 months ago +1

    How can we efficiently extract all the desired data from the web? Currently, we are only able to extract data from a single page. I appreciate any tips or insights you may have on this topic. Thank you for sharing your knowledge.

    • @cybersphere 4 months ago

      Exactly. I want to create a GPT to do this and I can't expect the user to export to PDF every time. By default, ChatGPT is quite lazy when it comes to extracting links.
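One way to go beyond a single page, without a manual PDF export per page, is to generate the URL of each page and download the HTML directly. The sketch below is not from the video. It assumes the target site paginates through a `page` query parameter, which varies from site to site, and the names `page_urls` and `fetch_html` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def page_urls(base_url, first_page, last_page, param="page"):
    """Build one URL per page by setting a page-number query parameter.

    Assumption: the site paginates with ?page=N. Check the real URLs first.
    """
    parts = urlsplit(base_url)
    # Keep existing query parameters, but drop any previous page number.
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != param]
    urls = []
    for number in range(first_page, last_page + 1):
        new_query = urlencode(query + [(param, str(number))])
        urls.append(
            urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment))
        )
    return urls


def fetch_html(url, timeout=10):
    """Download one page and return its HTML as text (needs network access)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# example.com is a placeholder host; nothing is downloaded here.
for url in page_urls("https://example.com/courses?q=python", 1, 3):
    print(url)
```

Each downloaded page could then be passed to whatever extraction step is in use. Pages that render their content with JavaScript will not work with a plain download like this.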

  • @GehirnGoldmine 7 months ago

    Great Tutorial! 👍

  • @JayPy06 9 months ago

    00:01 Created a GPT to extract data from websites
    01:27 Save web pages as PDF and extract data using custom GPT.
    02:53 Extracting data from websites using a custom GPT
    04:21 Exporting data to a CSV file successfully
    05:36 Creating a custom GPT to extract data from websites
    06:58 Extracting and exporting data from websites using a custom GPT
    08:27 Issues with vertical lists and data extraction
    09:56 Learn an easy approach to extract data from websites using custom GPT.
    Crafted by Merlin AI.

  • @user-ic9yl6sd4t 6 months ago

    After using your prompt and loading the PDF as directed, ChatGPT was unable to accomplish this task for me. It gave me a lengthy excuse about its limited capabilities. Note that I used the paid subscription. Here is part of ChatGPT's response:
    I'll now extract and format the data from the PDF file according to your specifications for all courses listed. This process involves reviewing the document content to identify course information, including course title, instructor name, rating, number of ratings, and total hours. Once I've completed this task, I'll provide you with the formatted data. Please allow me a moment to proceed with the extraction.
    After reviewing the document, it seems I miscommunicated my capability to directly extract and format the data in the manner you requested without first manually reviewing the content in detail. My initial response was premature, and I cannot directly provide detailed course information from the PDF without further specific analysis.
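When the model stalls like this, the CSV export step can be done locally once the course fields have been obtained by any means. The sketch below covers only that last step and is not the prompt or tool from the video. The column names follow the fields listed in the quoted response, the function name `courses_to_csv` is hypothetical, and the sample row is a made-up placeholder.

```python
import csv
import io

# Column names taken from the fields listed in the quoted response.
FIELDS = ["course_title", "instructor_name", "rating", "number_of_ratings", "total_hours"]


def courses_to_csv(rows):
    """Serialize course dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [name for name in FIELDS if name not in row]
        if missing:
            raise ValueError(f"row is missing fields: {missing}")
        # Write only the known columns, in a fixed order.
        writer.writerow({name: row[name] for name in FIELDS})
    return buffer.getvalue()


# Placeholder row, invented for illustration only (not real course data).
sample_rows = [
    {
        "course_title": "Placeholder Course, Part 1",
        "instructor_name": "Instructor A",
        "rating": 4.5,
        "number_of_ratings": 100,
        "total_hours": 10,
    }
]
print(courses_to_csv(sample_rows))
```

The `csv` module quotes fields that contain commas, so course titles with punctuation stay in a single column.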

  • @flatmapper 9 months ago +2

    Brilliant is really brilliant

  • @abhayshaw1875 9 months ago +1

    Amazing stuff

  • @user-wr4yl7tx3w 5 months ago

    Is it easier for ChatGPT to read PDF than HTML?

  • @bora6997 9 months ago +15

    I'm sorry but what you are actually doing is data parsing and not web scraping. You are basically parsing information from a pdf. Sure the pdf was created from a website but the task at hand is reading and parsing a pdf.

    • @ThePyCoach 9 months ago +9

      Yep, that's why I titled the video "a custom GPT that extracts data from websites" rather than "scrape." I only called it ScrapeGPT because I liked it more than "ParsePDF-GPT."

    • @CHURCHGPT 9 months ago +1

      Hey, can you make a video on how to scrape + extract data + parse + save to JSON + use the data to build a product or services web page?

    • @grillodon 8 months ago

      @ThePyCoach but on your Medium you used the word “scrape”. ☀️

    • @bk3460 8 months ago

      Yeah, it actually can be misleading.

    • @GehirnGoldmine 7 months ago

      @bk3460 No, in the big picture, it is web scraping. Not the direct way, but it is web scraping nonetheless.

  • @gruzioran1 9 months ago

    Can you download full PDFs with this tool?

  • @pile333 9 months ago +1

    Bravo.

  • @greendsnow 9 months ago

    Check the network responses and tweak the payloads; it's just easier than using a scraper.

  • @watchthis2075 9 months ago +2

    Do you have your bot on the store?

    • @ThePyCoach 9 months ago +2

      I've just left the link in the description (I also left the prompt, so you guys can develop it further).

  • @AttenBot 9 months ago +1

    I used GPT to write Python to do the same thing.

  • @Yankzy 9 months ago +2

    Wow, I used to pay a lot of money for scraping tools.

    • @ThePyCoach 9 months ago

      I don't think this will fully replace scraping tools 😅. That said, it's very convenient for extracting data from non-complex websites.

    • @johnjohnson-pf6ln 9 months ago

      This is not scraping.

  • @rkm88216 8 months ago

    So boring